Abstract

The paper studies average consensus with random topologies (intermittent links) and noisy channels.
Consensus with noise in the network links leads to the bias-variance dilemma: running consensus for
long reduces the bias of the final average estimate but increases its variance. We present two different
compromises to this tradeoff: the A-ND algorithm modifies conventional consensus by forcing the
weights to satisfy a persistence condition (slowly decaying to zero); and the A-NC algorithm, where the
weights are constant but consensus is run for a fixed number of iterations $\hat{\imath}$, then restarted and rerun
for a total of $\hat{p}$ runs, and at the end the final states of the $\hat{p}$ runs are averaged (Monte Carlo averaging). We use
controlled Markov processes and stochastic approximation arguments to prove almost sure convergence of
A-ND to the desired average (asymptotic unbiasedness) and compute explicitly the m.s.e. (variance)
of the consensus limit. We show that A-ND represents the best of both worlds, low bias and low
variance, at the cost of a slow convergence rate; rescaling the weights balances the variance versus the
rate of bias reduction (convergence rate). In contrast, A-NC, because of its constant weights, converges
fast but presents a different bias-variance tradeoff. For the same number of iterations $\hat{\imath}\hat{p}$, shorter runs
(smaller $\hat{\imath}$) lead to high bias but smaller variance (larger number $\hat{p}$ of runs to average over). For a static
non-random network with Gaussian noise, we compute the optimal gain for A-NC to reach in the
shortest run length $\hat{\imath}$, with high probability $(1 - \delta)$, $(\epsilon, \delta)$-consensus ($\epsilon$ residual bias). Our results hold
under fairly general assumptions on the random link failures and communication noise.

I. Introduction

Distributed computation in sensor networks is a well-studied field with an extensive body of literature 
(see, for example, [1] for early work.) Average consensus computes iteratively the global average of 
distributed data using local communications, see [2], [3], [4], [5] that consider versions and extensions 
of basic consensus. A review of the consensus literature is in [6]. Reference [7] designs the optimal 
link weights that optimize the convergence rate of the consensus algorithm when the connectivity graph 
of the network is fixed (not random). Our previous work, [8], [9], [10], [11], extends [7] by designing 
the topology, i.e., both the weights and the connectivity graph, under a variety of conditions, including 
random links and link communication costs, under a network communication budget constraint. 

We consider distributed average consensus when, simultaneously, the network topology is random (link
failures, as when packets are lost in data networks) and the communication among sensors is
noisy. A typical example is time division multiplexing, where, in a particular user's time slot, the channel
may not be available and, if available, we assume the communication is analog and noisy. Our approach
can handle spatially correlated link failures through Markovian sequences of Laplacians and certain types 
of Markovian noise, which go beyond independently, identically distributed (i.i.d.) Laplacian matrices 
and i.i.d. communication noise sequences. Noisy consensus leads to a tradeoff between bias and variance. 
Running consensus longer reduces bias, i.e., the error between the desired average and the consensus 
reached. But, due to noise, the variance of the limiting consensus grows with longer runs. To address this 
dilemma, we consider two versions of consensus with link failures and noise that represent two different 
bias-variance tradeoffs: the A-ND and the A-NC algorithms.

A-ND updates each sensor state with a weighted fusion of its current neighbors' states (received
distorted by noise). The fusion weights $\alpha(i)$ satisfy a persistence condition, decreasing to zero, but
not too fast. A-ND falls under the purview of controlled Markov processes and we use stochastic
approximation techniques to prove its almost sure (a.s.) consensus when the network is connected on the
average: the sensor state vector sequence converges a.s. to the consensus subspace. A simple condition for
connectedness on the average is on the second eigenvalue of the mean Laplacian, $\bar{L} = E[L]$: $\lambda_2(\bar{L}) > 0$. We establish
that the sensor states converge asymptotically a.s. to a finite random variable $\theta$ and, in particular, the
expected sensor states converge to the desired average $r$ (asymptotic unbiasedness). We determine the
variance of $\theta$, which is the mean square error (m.s.e.) between $\theta$ and the desired average. By properly
tuning the weight sequence $\{\alpha(i)\}$, the variance of $\theta$ can be made arbitrarily small, though at the cost of
slowing A-ND's convergence rate, i.e., the rate at which the bias goes to zero.

A-NC is a repeated averaging algorithm that performs in-network Monte Carlo simulations: it runs
consensus $\hat{p}$ times with constant weight $\alpha$, for a fixed number of iterations $\hat{\imath}$ each time, and then each
sensor averages its $\hat{p}$ values of the state at the final iteration $\hat{\imath}$ of each run. A-NC's constant weight $\alpha$
speeds its convergence rate relative to A-ND's, whose weights $\alpha(i)$ decrease to zero. We determine
the number of iterations $\hat{\imath}\hat{p}$ required to reach $(\epsilon, \delta)$-consensus, i.e., for the bias of the consensus limit at
each sensor to be smaller than $\epsilon$, with high probability $(1 - \delta)$. For non-random networks, we establish
a tight upper bound on the minimizing $\hat{\imath}\hat{p}$ and compute the corresponding optimal constant weight $\alpha$. We
quantify the tradeoff between the number of iterations $\hat{\imath}$ per Monte Carlo run and the number of runs $\hat{p}$.

Finally, we compare the bias-variance tradeoffs between the two algorithms and the network parameters 
that determine their convergence rate and noise resilience. The fixed weight A-NC algorithm can
converge faster but requires greater inter-sensor coordination than the A-ND algorithm.

Comparison with existing literature. Random link failures and additive channel noise have been 
considered separately. Random link failures, but noiseless consensus, is in [11], [12], [13], [14], [15], 
[16]. References [11], [12], [13] assume an erasure model: the network links fail independently in space 
(independently of each other) and in time (link failure events are temporally independent.) Papers [14], 
[16] study directed topologies with only time i.i.d. link failures, but impose distributional assumptions 
on the link formation process. In [15] the link failures are i.i.d. Laplacian matrices, the graph is directed, 
and no distributional assumptions are made on the Laplacian matrices. The paper presents necessary and 
sufficient conditions for consensus using the ergodicity of products of stochastic matrices. 

Similarly, [17], [18], [19] consider consensus with additive noise, but fixed or static, non random 
topologies (no link failures.) They use a decreasing weight sequence to guarantee consensus. These 
references do not characterize the m.s.e. For example, [18], [19] rely on the existence of a unique solution 
to an algebraic Lyapunov equation. The more general problem of distributed estimation (of which average 
consensus is a special case) in the presence of additive noise is in [20], again with a fixed topology. 
Both [17], [20] assume a temporally white noise sequence, while our approach can accommodate a more 
general Markovian noise sequence, in addition to white noise processes. 

In summary, with respect to [11]-[20], our approach considers: i) random topologies and noisy
communication links simultaneously; ii) spatially correlated (Markovian) dependent random link
failures; iii) time Markovian noise sequences; iv) undirected topologies; v) no distributional assumptions;
vi) consensus (estimation being considered elsewhere); and vii) two versions of consensus representing
different compromises of bias versus variance.

Briefly, the paper is organized as follows. Sections II and III summarize relevant spectral graph and average
consensus results. Sections IV and V treat additive noise with random link failures,
analyzing the A-ND and A-NC algorithms, respectively. Finally, Section VI concludes the paper.

II. Elementary Spectral Graph Theory

We summarize briefly facts from spectral graph theory. For an undirected graph $G = (V, E)$,
$V = [1 \cdots N]$ is the set of nodes or vertices, $|V| = N$, and $E$ is the set of edges, $|E| = M$. The unordered
pair $(n, l) \in E$ if there exists an edge between nodes $n$ and $l$. We only consider simple graphs, i.e.,
graphs devoid of self-loops and multiple edges. A path between nodes $n$ and $l$ of length $m$ is a sequence
$(n = i_0, i_1, \cdots, i_m = l)$ of vertices such that $(i_k, i_{k+1}) \in E$, $0 \le k \le m - 1$. A graph is connected if
there exists a path between each pair of nodes. The neighborhood of node $n$ is

$\Omega_n = \{ l \in V \mid (n, l) \in E \}$ (1)

Node $n$ has degree $d_n = |\Omega_n|$ (number of edges with $n$ as one end point). The structure of the graph can
be described by the symmetric $N \times N$ adjacency matrix $A = [A_{nl}]$, with $A_{nl} = 1$ if $(n, l) \in E$, and $A_{nl} = 0$ otherwise.
Let the degree matrix be the diagonal matrix $D = \mathrm{diag}(d_1 \cdots d_N)$. The graph Laplacian matrix, $L$, is

$L = D - A$ (2)

The Laplacian is a positive semidefinite matrix; hence, its eigenvalues can be ordered as

$0 = \lambda_1(L) \le \lambda_2(L) \le \cdots \le \lambda_N(L)$ (3)

The multiplicity of the zero eigenvalue equals the number of connected components of the network; for
a connected graph, $\lambda_2(L) > 0$. This second eigenvalue is the algebraic connectivity or the Fiedler value
of the network; see [21], [22], [23] for detailed treatment of graphs and their spectral theory.
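
As a small illustration of these definitions, the following sketch builds $L = D - A$ for a hypothetical 4-node ring and checks two facts used later: the vector of ones lies in the null space of $L$, and $x^T L x$ equals the sum of squared differences across edges (which is why $L$ is positive semidefinite). The function names, the example graph, and the zero-based node indices are assumptions made for illustration; connectivity is tested by breadth-first search rather than by computing $\lambda_2(L)$.

```python
from collections import deque

def laplacian(num_nodes, edges):
    """Return L = D - A (eqn. (2)) as a list of rows for a simple undirected graph."""
    L = [[0] * num_nodes for _ in range(num_nodes)]
    for n, l in edges:
        L[n][n] += 1      # degree contribution (matrix D)
        L[l][l] += 1
        L[n][l] -= 1      # adjacency contribution (matrix -A)
        L[l][n] -= 1
    return L

def quadratic_form(L, x):
    """Compute x^T L x."""
    size = len(x)
    return sum(x[n] * sum(L[n][l] * x[l] for l in range(size)) for n in range(size))

def is_connected(num_nodes, edges):
    """Breadth-first search; for a connected graph, lambda_2(L) > 0."""
    neighbors = {n: set() for n in range(num_nodes)}
    for n, l in edges:
        neighbors[n].add(l)
        neighbors[l].add(n)
    seen, queue = {0}, deque([0])
    while queue:
        for l in neighbors[queue.popleft()]:
            if l not in seen:
                seen.add(l)
                queue.append(l)
    return len(seen) == num_nodes

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # hypothetical 4-node ring, nodes indexed from 0
L = laplacian(4, edges)
x = [1.0, -2.0, 0.5, 3.0]
edge_energy = sum((x[n] - x[l]) ** 2 for n, l in edges)
print(quadratic_form(L, x), edge_energy, is_connected(4, edges))
```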

III. Distributed Average Consensus with Imperfect Communication 
In a simple form, distributed average consensus computes the average $r$ of the initial node data

$r = \frac{1}{N} \sum_{n=1}^{N} x_n(0)$ (4)

by local data exchanges among neighbors. For noiseless and unquantized data exchanges across the
network links, the state of each node is updated iteratively by

$x_n(i+1) = w_{nn}(i)\, x_n(i) + \sum_{l \in \Omega_n(i)} w_{nl}(i)\, x_l(i), \quad 1 \le n \le N$ (5)

where the link weights, $w_{nl}(i)$, may be constant or time varying. Similarly, the topology of a time-varying
network is captured by making the neighborhoods, $\Omega_n$'s, a function of time. Because noise causes
consensus to diverge, [24], [10], we let the link weights be the same across different network links,
but vary with time. Eq. (5) becomes

$x_n(i+1) = [1 - \alpha(i) d_n(i)]\, x_n(i) + \alpha(i) \sum_{l \in \Omega_n(i)} x_l(i), \quad 1 \le n \le N$ (6)

We address consensus with imperfect inter-sensor communication, where each sensor receives noise
corrupted versions of its neighbors' states. This leads to the state update given by

$x_n(i+1) = [1 - \alpha(i) d_n(i)]\, x_n(i) + \alpha(i) \sum_{l \in \Omega_n(i)} f_{nl,i}[x_l(i)], \quad 1 \le n \le N$ (7)

where $\{f_{nl,i}\}_{1 \le n, l \le N,\ i \ge 0}$ is a sequence of functions (possibly random) modeling the channel
imperfections. In the following sections, we analyze the consensus problem given by eqn. (7), when the channel
communication is corrupted by additive noise. In [25], we consider the effects of quantization (see
also [26] for a treatment of consensus algorithms with quantized communication). Here, we study two
different algorithms. The first, A-ND, considers a decreasing weight sequence ($\alpha(i) \to 0$) and is
analyzed in Section IV. The second, A-NC, uses repeated averaging with a constant link weight and
is detailed in Section V.
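
A minimal sketch of one iteration of the update in eqn. (7) may help fix ideas. It assumes a common weight `alpha` for the current iteration, a dictionary `neighbors` giving the current neighborhood of each node, and a `channel` function standing in for $f_{nl,i}$; these names, and the Gaussian additive-noise channel used as an example, are illustrative assumptions rather than the paper's notation.

```python
import random

def consensus_step(x, neighbors, alpha, channel):
    """One iteration of eqn. (7): each node fuses the received neighbor states."""
    new_x = []
    for n, x_n in enumerate(x):
        # channel(n, l, y) models what node n receives when node l sends y
        received = [channel(n, l, x[l]) for l in neighbors[n]]
        d_n = len(neighbors[n])                      # current degree of node n
        new_x.append((1 - alpha * d_n) * x_n + alpha * sum(received))
    return new_x

def additive_noise_channel(sigma, rng):
    """Hypothetical channel: the transmitted value plus zero-mean Gaussian noise."""
    def channel(n, l, y):
        return y + rng.gauss(0.0, sigma)
    return channel

neighbors = {0: [1], 1: [0, 2], 2: [1]}              # hypothetical 3-node path
x0 = [0.0, 3.0, 6.0]
noiseless = consensus_step(x0, neighbors, 0.25, lambda n, l, y: y)
noisy = consensus_step(x0, neighbors, 0.25, additive_noise_channel(1.0, random.Random(0)))
print(noiseless, noisy)
```

With a perfect channel the update reduces to eqn. (6) and the sum of the states is preserved; with noise the sum drifts, which is the source of the variance discussed in the following sections.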

IV. A-ND: Consensus in Additive Noise and Random Link Failures

We consider distributed consensus when the network links fail or become alive at random times, and
data exchanges are corrupted by additive noise. The network topology varies randomly across iterations.
We analyze the convergence properties of the A-ND algorithm under this generic scenario. We start
by formalizing the assumptions underlying A-ND in the next Subsection.

A. Problem Formulation and Assumptions 

We compute the average of the initial state $x(0) = [x_1(0) \cdots x_N(0)]^T \in \mathbb{R}^{N \times 1}$ with the distributed
consensus algorithm with communication channel imperfections given in eqn. (7). Let $\{v_{nl}(i)\}_{1 \le n, l \le N,\ i \ge 0}$
be a sequence of independent zero mean random variables. For additive noise,

$f_{nl,i}(y) = y + v_{nl}(i)$ (8)

Recall the Laplacian $L$ defined in (2). Collecting the states $x_n(i)$ in the vector $x(i)$, eqn. (7) is

$x(i+1) = x(i) - \alpha(i) [L(i) x(i) + n(i)]$ (9)
$[n(i)]_n = n_n(i) = -\sum_{l \in \Omega_n(i)} v_{nl}(i), \quad 1 \le n \le N$ (10)

We now state the assumptions of the A-ND algorithm. $^1$

1) Random Network Failure: We propose two models; the second is more general than the first. 

1.1) Temporally i.i.d. Laplacian Matrices: The graph Laplacians are

$L(i) = \bar{L} + \tilde{L}(i), \quad \forall i \ge 0$ (11)

where $\{L(i)\}_{i \ge 0}$ is a sequence of i.i.d. Laplacian matrices with mean $\bar{L} = E[L(i)]$, such that
$\lambda_2(\bar{L}) > 0$. We do not make any distributional assumptions on the link failure model and, in fact, as long as the
sequence $\{L(i)\}_{i \ge 0}$ is independent with constant mean $\bar{L}$, satisfying $\lambda_2(\bar{L}) > 0$, the i.i.d. assumption
can be dropped. During the same iteration, the link failures can be spatially dependent, i.e., correlated
across different edges of the network. This model subsumes the erasure network model, where the link
failures are independent both over space and time. Wireless sensor networks motivate this model since
interference among the sensors' communication correlates the link failures over space, while over time,
it is still reasonable to assume that the channels are memoryless or independent.
Connectedness of the graph is an important issue. We do not require that the random instantiations $G(i)$
of the graph be connected; in fact, it is possible for all these instantiations to be disconnected. We
only require that the graph stays connected on average. This is captured by requiring that $\lambda_2(\bar{L}) > 0$,
enabling us to capture a broad class of asynchronous communication models; for example, the
random asynchronous gossip protocol analyzed in [28] satisfies $\lambda_2(\bar{L}) > 0$ and hence falls under this
framework.

1.2) Temporally Markovian Laplacian Matrices: Our results hold when the Laplacian matrix
sequence $\{L(i, x(i))\}_{i \ge 0}$ is state-dependent. More precisely, we assume that there exists a two-parameter
random field, $\{L(i, x)\}_{i \ge 0,\ x \in \mathbb{R}^{N \times 1}}$, of Laplacian matrices such that

$E[L(i, x)] = \bar{L}, \quad \forall i, x$ (12)

and $\lambda_2(\bar{L}) > 0$. We also require that, for a fixed $i$, the random matrices, $\{L(i, x)\}_{x \in \mathbb{R}^{N \times 1}}$, are
independent of the sigma algebra, $\sigma\{x(j), 0 \le j \le i\}$. $^2$ It is clear then that the Laplacian matrix
sequence, $\{L(i, x(i))\}_{i \ge 0}$, is Markov. We will show that our convergence analysis holds also for this
general link failure model. Such a model may be appropriate in stochastic formation control scenarios,
see [29], [30], [31], where the network topology is state-dependent.

$^1$ See also [27], where parts of the results are presented.

$^2$ This guarantees that the Laplacian $L(i, x(i))$ may depend on the past state history $\{x(j), j \le i\}$ only through the present
state $x(i)$.

2) Communication Noise Model: We propose two models; the second is more general than the first.

2.1) Independent Noise Sequence: The additive noise $\{v_{nl}(i)\}_{1 \le n, l \le N,\ i \ge 0}$ is an independent sequence with

$E[v_{nl}(i)] = 0, \ \forall\, 1 \le n, l \le N,\ i \ge 0, \qquad \sup_{n,l,i} E[v_{nl}^2(i)] = \mu < \infty$ (13)

The sequences $\{L(i)\}_{i \ge 0}$ and $\{v_{nl}(i)\}_{1 \le n, l \le N,\ i \ge 0}$ are mutually independent. Hence, $\{v_{nl}(i)\}$,
$1 \le n, l \le N$, $i \ge 0$, are independent of $\sigma(x(j), 0 \le j \le i)$, $\forall i$. Then, from eqn. (10),

$E[n(i)] = 0, \ \forall i, \qquad \sup_i E[\|n(i)\|^2] = \eta \le N(N-1)\mu < \infty$ (14)

No distributional assumptions are required on the noise sequence.

2.2) Markovian Noise Sequence: Our approach allows the noise sequence to be Markovian through
state-dependence. Let the two-parameter random field, $\{n(i, x)\}_{i \ge 0,\ x \in \mathbb{R}^{N \times 1}}$, of random vectors satisfy

$E[n(i, x)] = 0, \quad \forall i, x$ (15)

For fixed $i$, the random vectors, $\{n(i, x)\}_{x \in \mathbb{R}^{N \times 1}}$, are independent of the $\sigma$-algebra, $\sigma(x(j), 0 \le j \le i)$,
and the random families $\{L(i, x)\}_{x \in \mathbb{R}^{N \times 1}}$ and $\{n(i, x)\}_{x \in \mathbb{R}^{N \times 1}}$ are independent. It is clear then that the
noise vector sequence, $\{n(i, x(i))\}_{i \ge 0}$, is Markov. Note, however, that in this case the resulting Laplacian
and noise sequences, $\{L(i, x(i))\}_{i \ge 0}$ and $\{n(i, x(i))\}_{i \ge 0}$, are no longer independent; they are coupled
through the state $x(i)$. In addition to (15), we require the variance of the noise component orthogonal
to the consensus subspace $\mathcal{C}$ (see eqn. (31)) to satisfy, for constants $c_1, c_2 > 0$,

$E[\|n_{\mathcal{C}^\perp}(i, x)\|^2] \le c_1 + c_2 \|x_{\mathcal{C}^\perp}\|^2$ (16)

We do not restrict the variance growth rate of the noise component in the consensus subspace. This
clearly subsumes the bounded noise variance model. An example of such noise is

$n(i, x(i)) = \vartheta(i) (x(i) + w(i))$ (17)

where $\{\vartheta(i)\}_{i \ge 0}$ and $\{w(i)\}_{i \ge 0}$ are zero mean finite variance mutually i.i.d. sequences of scalars and
vectors, respectively. It is then clear that the condition in eqn. (16) is satisfied, and the noise model 2.2)
applies. The model in eqn. (17) arises, for example, in multipath effects in MIMO systems, when the
channel adds multiplicative noise whose amplitude is proportional to the transmitted data.
3) Persistence Condition: The weights decay to zero, but not too fast

$\alpha(i) > 0, \qquad \sum_{i \ge 0} \alpha(i) = \infty, \qquad \sum_{i \ge 0} \alpha^2(i) < \infty$ (18)

This condition is commonly assumed in adaptive control and signal processing. Examples include

$\alpha(i) = \frac{1}{i^{\beta}}, \quad .5 < \beta \le 1$ (19)

For clarity, in the main body of the paper, we prove the results for the A-ND algorithm under
Assumptions 1.1), 2.1), and 3). In the Appendix, we point out how to modify the proofs when the
more general assumptions 1.2) and 2.2) hold.
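
The persistence condition 3) can be probed numerically. The sketch below, which is only an illustration and not a proof, evaluates partial sums for weights of the form $1/(i+1)^{\beta}$; the shift by one (to avoid dividing by zero at $i = 0$) and the function names are assumptions. The partial sums of the weights keep growing, while the partial sums of the squared weights stay below a fixed constant.

```python
import math

def alpha(i, beta=1.0):
    """Weight 1/(i+1)^beta; the +1 shift is an assumption that avoids division by zero."""
    return 1.0 / (i + 1) ** beta

def partial_sums(beta, terms):
    """Return (sum of weights, sum of squared weights) over the first `terms` indices."""
    s1 = sum(alpha(i, beta) for i in range(terms))
    s2 = sum(alpha(i, beta) ** 2 for i in range(terms))
    return s1, s2

for terms in (10**2, 10**3, 10**4, 10**5):
    s1, s2 = partial_sums(1.0, terms)
    # s1 grows like log(terms); s2 stays below pi^2 / 6
    print(terms, round(s1, 3), round(s2, 5), s2 < math.pi ** 2 / 6)
```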

We now prove the almost sure (a.s.) convergence of the A-ND algorithm in eqn. (9) by using results
from the theory of stochastic approximation algorithms, [32].

B. A Result on Convergence of Markov Processes 

A systematic and thorough treatment of stochastic approximation procedures has been given in [32]. 
In this section, we modify slightly a result from [32] and restate it as a theorem in a form relevant to 
our application. We follow the notation of [32], which we now introduce. 

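Let $X = \{x(i)\}_{i \ge 0}$ be a Markov process on $\mathbb{R}^{N \times 1}$. The generating operator $\mathcal{L}$ of $X$ is

$\mathcal{L} V(i, x) = E[V(i+1, x(i+1)) \mid x(i) = x] - V(i, x)$ (20)

for functions $V(i, x)$, $i \ge 0$, $x \in \mathbb{R}^{N \times 1}$, provided the conditional expectation exists. We say that
$V(i, x) \in D_{\mathcal{L}}$ in a domain $A$ if $\mathcal{L} V(i, x)$ is finite for all $(i, x) \in A$.

Denote the Euclidean metric by $\rho(\cdot)$. For $B \subset \mathbb{R}^{N \times 1}$, the $\epsilon$-neighborhood of $B$ and its complement are

$U_\epsilon(B) = \{ x \mid \inf_{y \in B} \rho(x, y) < \epsilon \}$ (21)
$V_\epsilon(B) = \mathbb{R}^{N \times 1} \setminus U_\epsilon(B)$ (22)

We now state the desired theorem, whose proof we sketch in the Appendix.

Theorem 1 Let $X$ be a Markov process with generating operator $\mathcal{L}$. Let there exist a non-negative function
$V(i, x) \in D_{\mathcal{L}}$ in the domain $i \ge 0$, $x \in \mathbb{R}^{N \times 1}$, and $B \subset \mathbb{R}^{N \times 1}$ with the following properties:

1) $\inf_{i \ge 0,\ x \in V_\epsilon(B)} V(i, x) > 0, \quad \forall \epsilon > 0$ (23)

$V(i, x) \equiv 0, \quad x \in B$ (24)
$\lim_{x \to B} \sup_{i \ge 0} V(i, x) = 0$ (25)

2) $\mathcal{L} V(i, x) \le g(i) (1 + V(i, x)) - \alpha(i) \varphi(i, x)$ (26)

where $\varphi(i, x)$, $i \ge 0$, $x \in \mathbb{R}^{N \times 1}$, is a non-negative function such that

$\inf_{i,\ x \in V_\epsilon(B)} \varphi(i, x) > 0, \quad \forall \epsilon > 0$ (27)

3) $\alpha(i) > 0, \quad \sum_{i \ge 0} \alpha(i) = \infty$ (28)

$g(i) > 0, \quad \sum_{i \ge 0} g(i) < \infty$ (29)

Then, the Markov process $X = \{x(i)\}_{i \ge 0}$ with arbitrary initial distribution converges a.s. to $B$ as $i \to \infty$.
In other words,

$P[\lim_{i \to \infty} \rho(x(i), B) = 0] = 1$ (30)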

C. Proof of Convergence of the A-ND Algorithm

The A-ND distributed consensus algorithm is given by eqn. (9) in Section IV-A. To establish its
a.s. convergence using Theorem 1, define the consensus subspace, $\mathcal{C}$, aligned with $\mathbf{1}$, the vector of 1's,

$\mathcal{C} = \{ x \in \mathbb{R}^{N \times 1} \mid x = a\mathbf{1},\ a \in \mathbb{R} \}$ (31)

We recall a result on distance properties in $\mathbb{R}^{N \times 1}$ to be used in the sequel. We omit the proof.

Lemma 2 Let $S$ be a subspace of $\mathbb{R}^{N \times 1}$. For $x \in \mathbb{R}^{N \times 1}$, consider the orthogonal decomposition
$x = x_S + x_{S^\perp}$. Then $\rho(x, S) = \|x_{S^\perp}\|$.

Theorem 3 (A-ND a.s. convergence) Let assumptions 1.1), 2.1), and 3) hold. Consider the A-ND
consensus algorithm in eqn. (9) in Section IV-A with initial state $x(0) \in \mathbb{R}^{N \times 1}$. Then,

$P[\lim_{i \to \infty} \rho(x(i), \mathcal{C}) = 0] = 1$ (32)

Proof: Under the assumptions, the process $X = \{x(i)\}_{i \ge 0}$ is Markov. Define

$V(i, x) = x^T \bar{L} x$ (33)

The potential function $V(i, x)$ is non-negative. Since $x \in \mathcal{C}$ is an eigenvector of $\bar{L}$ with zero eigenvalue,

$V(i, x) \equiv 0, \ x \in \mathcal{C}, \qquad \lim_{x \to \mathcal{C}} \sup_{i \ge 0} V(i, x) = 0$ (34)

The second condition follows from the continuity of $V(i, x)$. By Lemma 2 and the definition in eqn. (22)
of the complement of the $\epsilon$-neighborhood of a set,

$x \in V_\epsilon(\mathcal{C}) \ \Rightarrow\ \|x_{\mathcal{C}^\perp}\| \ge \epsilon$ (35)

Hence, for $x \in V_\epsilon(\mathcal{C})$,

$x^T \bar{L} x \ \ge\ \lambda_2(\bar{L}) \|x_{\mathcal{C}^\perp}\|^2 \ \ge\ \lambda_2(\bar{L}) \epsilon^2$ (36)

Then, since by assumption 1.1) $\lambda_2(\bar{L}) > 0$ (note that the assumption $\lambda_2(\bar{L}) > 0$ comes into play here),
we get

$\inf_{i \ge 0,\ x \in V_\epsilon(\mathcal{C})} V(i, x) \ \ge\ \lambda_2(\bar{L}) \epsilon^2 > 0$ (37)
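Now consider $\mathcal{L} V(i, x)$ in eqn. (20). Using eqn. (9) in (20), we obtain

$\mathcal{L} V(i, x) = E[x(i+1)^T \bar{L} x(i+1) \mid x(i) = x] - x^T \bar{L} x$
$\qquad = E\left[ (x - \alpha(i) \bar{L} x - \alpha(i) \tilde{L}(i) x - \alpha(i) n(i))^T \bar{L} (x - \alpha(i) \bar{L} x - \alpha(i) \tilde{L}(i) x - \alpha(i) n(i)) \right] - x^T \bar{L} x$ (38)

Using the independence of $\tilde{L}(i)$ and $n(i)$ with respect to $x(i)$, that $\tilde{L}(i)$ and $n(i)$ are zero-mean, and
that the subspace $\mathcal{C}$ lies in the null space of $\bar{L}$ and $\tilde{L}(i)$ (the latter because this is true for both $L(i)$
and $\bar{L}$), eqn. (38) leads successively to

$\mathcal{L} V(i, x) = -2\alpha(i) x^T \bar{L}^2 x + \alpha^2(i) x^T \bar{L}^3 x + E\left[ \alpha^2(i) (\tilde{L}(i) x)^T \bar{L} (\tilde{L}(i) x) \right] + E\left[ \alpha^2(i) n(i)^T \bar{L} n(i) \right]$
$\qquad \le -2\alpha(i) x^T \bar{L}^2 x + \alpha^2(i) \lambda_N^3(\bar{L}) \|x_{\mathcal{C}^\perp}\|^2 + \alpha^2(i) \lambda_N(\bar{L}) E[\|\tilde{L}(i) x\|^2] + \alpha^2(i) \lambda_N(\bar{L}) E[\|n(i)\|^2]$
$\qquad \le -2\alpha(i) x^T \bar{L}^2 x + \alpha^2(i) \lambda_N^3(\bar{L}) \|x_{\mathcal{C}^\perp}\|^2 + \alpha^2(i) \lambda_N(\bar{L}) E[\lambda_{\max}^2(\tilde{L}(i))] \|x_{\mathcal{C}^\perp}\|^2 + \alpha^2(i) \lambda_N(\bar{L}) \eta$
$\qquad \le -2\alpha(i) x^T \bar{L}^2 x + \alpha^2(i) \lambda_N^3(\bar{L}) \|x_{\mathcal{C}^\perp}\|^2 + 4\alpha^2(i) N^2 \lambda_N(\bar{L}) \|x_{\mathcal{C}^\perp}\|^2 + \alpha^2(i) \lambda_N(\bar{L}) \eta$ (39)

The last step follows because all the eigenvalues of $\tilde{L}(i)$ are less than $2N$ in absolute value, by the
Gershgorin circle theorem. Now, by the fact $x^T \bar{L} x \ge \lambda_2(\bar{L}) \|x_{\mathcal{C}^\perp}\|^2$ and $\lambda_2(\bar{L}) > 0$, we have

$\mathcal{L} V(i, x) \le -2\alpha(i) x^T \bar{L}^2 x + \alpha^2(i) \left[ \lambda_N(\bar{L}) \eta + \frac{\lambda_N^3(\bar{L})}{\lambda_2(\bar{L})} x^T \bar{L} x + \frac{4N^2 \lambda_N(\bar{L})}{\lambda_2(\bar{L})} x^T \bar{L} x \right]$
$\qquad \le -\alpha(i) \varphi(i, x) + g(i) [1 + V(i, x)]$ (40)

where

$\varphi(i, x) = 2 x^T \bar{L}^2 x, \qquad g(i) = \alpha^2(i) \max\left( \lambda_N(\bar{L}) \eta,\ \frac{\lambda_N^3(\bar{L})}{\lambda_2(\bar{L})} + \frac{4N^2 \lambda_N(\bar{L})}{\lambda_2(\bar{L})} \right)$ (41)

It is easy to see that $V(i, x)$, $\varphi(i, x)$, and $g(i)$ defined above satisfy the conditions for Theorem 1. Hence,

$P[\lim_{i \to \infty} \rho(x(i), \mathcal{C}) = 0] = 1$ (42)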



Theorem 3 assumes 1.1), 2.1), 3). For an equivalent statement under 1.2), 2.2), see Theorem 11 in the 
Appendix. 

Theorem 3 shows that the sample paths approach the consensus subspace, $\mathcal{C}$, with probability 1 as
$i \to \infty$. We now show that the sample paths, in fact, converge a.s. to a finite point in $\mathcal{C}$.

Theorem 4 (a.s. consensus: Limiting random variable) Let assumptions 1.1), 2.1), and 3) hold. Consider
the A-ND consensus algorithm in eqn. (9) in Section IV-A with initial state $x(0) \in \mathbb{R}^{N \times 1}$. Then, there
exists an almost sure finite real random variable $\theta$ such that

$P[\lim_{i \to \infty} x(i) = \theta \mathbf{1}] = 1$ (43)

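Proof: Denote the average of $x(i)$ by

$x_{avg}(i) = \frac{1}{N} \mathbf{1}^T x(i)$ (44)

The conclusions of Theorem 3 imply that

$P[\lim_{i \to \infty} \|x(i) - x_{avg}(i) \mathbf{1}\| = 0] = 1$ (45)

Recall the distributed average consensus algorithm in eqn. (9). Premultiplying both sides of eqn. (9) by
$\frac{1}{N} \mathbf{1}^T$, we get the stochastic difference equation for the averages as

$x_{avg}(i+1) = x_{avg}(i) - \bar{v}(i) = x_{avg}(0) - \sum_{0 \le j \le i} \bar{v}(j)$ (46)

where

$\bar{v}(i) = \frac{\alpha(i)}{N} \mathbf{1}^T n(i)$

Given eqn. (14) and, in particular, that the sequence $\{n(i)\}$ is time independent, it follows that

$E[\bar{v}(i)] = 0, \ \forall i, \qquad \sum_{i \ge 0} E[\bar{v}^2(i)] = \sum_{i \ge 0} \frac{\alpha^2(i)}{N^2} E[\|n(i)\|^2] \le \frac{\eta}{N^2} \sum_{i \ge 0} \alpha^2(i) < \infty$ (47)

which implies

$E[x_{avg}^2(i)] \le x_{avg}^2(0) + \frac{\eta}{N^2} \sum_{j \ge 0} \alpha^2(j), \quad \forall i$ (48)

Thus, the sequence $\{x_{avg}(i)\}_{i \ge 0}$ is an $\mathcal{L}_2$ bounded martingale and, hence, converges a.s. and in $\mathcal{L}_2$ to a
finite random variable $\theta$ (see [33]). The theorem then follows from eqn. (45). ■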
Again, we note that we obtain an equivalent statement under Assumptions 1.2), 2.1) in Theorem 12 in the 
Appendix. Proving under Assumption 2.2) requires more specific information about the mixing properties 
of the Markovian noise sequence, which due to space limitations is not addressed here. 

D. Mean Square Error 

By Theorem 4, the sensors reach consensus asymptotically and converge a.s. to a finite random variable
$\theta$. Viewed as an estimate of the initial average $r$ (see eqn. (4)), $\theta$ should possess desirable properties like
unbiasedness and small m.s.e. The next Lemma characterizes the desirable statistical properties of $\theta$.

Lemma 5 Let $\theta$ be as in Theorem 4 and $r$ as in eqn. (4). Define the m.s.e. $\zeta$ as

$\zeta = E[(\theta - r)^2]$ (49)

Then, the consensus limit $\theta$ is unbiased and its m.s.e. is bounded as below.

$E[\theta] = r$ (50)
$\zeta \le \frac{\eta}{N^2} \sum_{i \ge 0} \alpha^2(i)$ (51)

Proof: From eqn. (46), we have

$E[x_{avg}(i)] = r, \quad \forall i$ (52)

Since $\{x_{avg}(i)\}_{i \ge 0}$ converges in $\mathcal{L}_2$ to $\theta$ (Theorem 4), it also converges in $\mathcal{L}_1$, and we have

$E[\theta] = \lim_{i \to \infty} E[x_{avg}(i)] = r$ (53)

which proves (50). For (51), by Theorem 4, the sequence $\{(x_{avg}(i) - r)^2\}_{i \ge 0}$ converges in $\mathcal{L}_1$ to $(\theta - r)^2$.
Hence,

$\zeta = E[(\theta - r)^2] = \lim_{i \to \infty} E[(x_{avg}(i) - r)^2] = \sum_{j \ge 0} \frac{\alpha^2(j)}{N^2} E[\|n(j)\|^2]$ (54)

Eqn. (51) follows from (54) and (48), with $\eta$ the bound on the noise variance. ■
Lemma 13 in the Appendix shows equivalent results under Assumptions 1.2), 2.1). Proving under 
Assumption 2.2) requires more specific information about the mixing properties of the Markovian noise 
sequence, which we do not pursue here. 

Eqn. (54) gives the exact representation of the m.s.e. As a byproduct, we obtain the following corollary
for an erasure network with identical link failure probabilities and i.i.d. channel noise.

Corollary 6 Consider an erasure network with $M$ realizable links, identical link failure probability, $p$, for
each link, and let the noise sequence, $\{v_{nl}(i)\}_{1 \le n, l \le N,\ i \ge 0}$, be of identical variance $\sigma^2$. Then the m.s.e. is

$\zeta = \frac{2M\sigma^2(1 - p)}{N^2} \sum_{j \ge 0} \alpha^2(j)$ (55)

Proof: Using the fact that the link failures and the channel noise are independent, we have

$E[\|n(i)\|^2] = 2\sigma^2 M (1 - p)$ (56)

The result then follows from eqn. (54). ■
While interpreting the dependence of eqn. (55) on the number of nodes, $N$, it should be noted that the
number of realizable links $M$ must be $\Omega(N)$ for the connectivity assumptions to hold.

Lemma 5 shows that, for a given bound on the noise variance, $\eta$, the m.s.e. $\zeta$ can be made arbitrarily
small by properly scaling the weight sequence, $\{\alpha(j)\}_{j \ge 0}$. As an example, consider the weight sequence,

$\alpha(i) = \frac{1}{i + 1}, \quad \forall i$

Clearly, this choice of $\alpha(i)$ satisfies the persistence conditions of eqn. (18) and, in fact,

$\sum_{j \ge 0} \alpha^2(j) = \sum_{j \ge 1} \frac{1}{j^2} = \frac{\pi^2}{6}$

Then, for any $\epsilon > 0$, the scaled weight sequence, $\{\alpha(j)\}_{j \ge 0}$,

$\alpha(j) = \frac{\sqrt{6\epsilon}\, N}{\sqrt{\eta}\, \pi (j + 1)}$

will guarantee that $\zeta \le \epsilon$. However, reducing the m.s.e. by scaling the weights in this way will reduce
the convergence rate of the algorithm; this trade-off is considered in the next Subsection.
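
The scaling argument can be checked with a few lines of arithmetic. The sketch below assumes the scaled weights have the form $\alpha(j) = \sqrt{6\epsilon}\,N / (\sqrt{\eta}\,\pi (j+1))$ and evaluates the bound $(\eta/N^2)\sum_j \alpha^2(j)$ of eqn. (51) over a finite number of terms; the function names and the numerical values of $\epsilon$, $\eta$, and $N$ are hypothetical.

```python
import math

def scaled_weight(j, eps, eta, num_nodes):
    """Assumed scaled weight: sqrt(6 eps) N / (sqrt(eta) pi (j + 1))."""
    return math.sqrt(6.0 * eps) * num_nodes / (math.sqrt(eta) * math.pi * (j + 1))

def mse_bound(eps, eta, num_nodes, terms):
    """Partial sum of the bound (eta / N^2) * sum_j alpha(j)^2 from eqn. (51)."""
    total = sum(scaled_weight(j, eps, eta, num_nodes) ** 2 for j in range(terms))
    return eta / num_nodes ** 2 * total

eps, eta, num_nodes = 0.01, 30.0, 100      # hypothetical target m.s.e., noise bound, network size
bound = mse_bound(eps, eta, num_nodes, 10_000)
print(bound, bound < eps)
```

The partial sums approach the target $\epsilon$ from below, while the weights themselves shrink in proportion to $\sqrt{\epsilon}$, which is what slows the convergence.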

E. Convergence Rate 

The A-ND algorithm falls under the framework of stochastic approximation and, hence, a detailed
convergence rate analysis can be done through the ODE method (see, for example, [34]). For clarity of
presentation, we skip this detailed analysis here; rather, we present a simpler convergence rate analysis,
involving the mean state vector sequence only under assumptions 1.1), 2.1), and 3). From the asymptotic
unbiasedness of $\theta$, it follows

$\lim_{i \to \infty} E[x(i)] = r\mathbf{1}$ (57)

Our goal is to determine the rate at which the sequence $\{E[x(i)]\}_{i \ge 0}$ converges to $r\mathbf{1}$. Since $L(i)$ and
$x(i)$ are independent, and $n(i)$ is zero mean, we have

$E[x(i+1)] = (I - \alpha(i) \bar{L})\, E[x(i)], \quad \forall i$ (58)

By the persistence condition (18), the sequence $\alpha(i) \to 0$. Without loss of generality, we can assume that

$\alpha(i) \le \frac{2}{\lambda_2(\bar{L}) + \lambda_N(\bar{L})}, \quad \forall i$ (59)

Then, it can be shown that (see [11])

$\|E[x(i)] - r\mathbf{1}\| \le \left( \prod_{0 \le j \le i-1} (1 - \alpha(j) \lambda_2(\bar{L})) \right) \|E[x(0)] - r\mathbf{1}\|$ (60)

Now, since $1 - a \le e^{-a}$, $0 \le a \le 1$, we have

$\|E[x(i)] - r\mathbf{1}\| \le e^{-\lambda_2(\bar{L}) \left( \sum_{0 \le j \le i-1} \alpha(j) \right)} \|E[x(0)] - r\mathbf{1}\|$ (61)

Eqn. (61) shows that the rate of convergence of the mean consensus depends on the topology through the
algebraic connectivity $\lambda_2(\bar{L})$ of the mean graph and through the weights $\alpha(i)$. It is interesting to note
that the way the random link failures affect the convergence rate (at least for the mean state sequence) is
through $\lambda_2(\bar{L})$, the algebraic connectivity of the mean Laplacian, in the exponent, whereas, for a static
network, this reduces to the algebraic connectivity of the static Laplacian $L$, recovering the results in [17].
Eqns. (61) and (51) show a tradeoff between the m.s.e. and the rate of convergence at which the
sequence $\{E[x(i)]\}_{i \ge 0}$ converges to $r\mathbf{1}$. Eqn. (61) shows that this rate of convergence is closely related
to the rate at which the weight sequence, $\alpha(i)$, sums to infinity. For a faster rate, we want the weights
to sum up fast to infinity, i.e., the weights to be large. In contrast, eqn. (51) shows that, to achieve a
small $\zeta$, the weights should be small.
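
The bound in eqn. (61) can be checked numerically on a small example. The sketch below assumes an erasure model on a hypothetical 6-node ring, so that the mean Laplacian is $(1 - p)$ times the ring Laplacian, whose eigenvalues are $2 - 2\cos(2\pi k / N)$; the weight sequence $0.5/(i+1)$ is likewise an illustrative choice that respects eqn. (59). It iterates the mean recursion of eqn. (58) and compares the deviation from $r\mathbf{1}$ with the exponential bound.

```python
import math

N, p = 6, 0.4                                            # hypothetical ring size and link failure probability
lam2 = (1 - p) * (2 - 2 * math.cos(2 * math.pi / N))     # lambda_2 of the mean Laplacian
lamN = (1 - p) * 4.0                                     # lambda_N of the mean Laplacian (ring, even N)

def alpha(i):
    return 0.5 / (i + 1)          # satisfies alpha(i) <= 2 / (lam2 + lamN) for all i

def mean_step(m, a):
    """E[x(i+1)] = (I - a * Lbar) E[x(i)], with Lbar = (1 - p) * L_ring."""
    return [m[n] - a * (1 - p) * (2 * m[n] - m[n - 1] - m[(n + 1) % N]) for n in range(N)]

def deviation(m, r):
    return math.sqrt(sum((v - r) ** 2 for v in m))

m = [float(n) for n in range(N)]  # E[x(0)] = x(0)
r = sum(m) / N
d0, weight_sum, bound_holds = deviation(m, r), 0.0, True
for i in range(200):
    m = mean_step(m, alpha(i))
    weight_sum += alpha(i)
    bound = math.exp(-lam2 * weight_sum) * d0            # right-hand side of eqn. (61)
    bound_holds = bound_holds and deviation(m, r) <= bound + 1e-12
print(bound_holds, deviation(m, r), bound)
```

Scaling `alpha` down makes `weight_sum` grow more slowly and loosens the exponential decay, which is the convergence-rate side of the trade-off.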

We studied the trade-off between m.s.e. and convergence rate for the mean state vector only. In
general, more effective measures of convergence rate are appropriate; intuitively, the same trade-offs will
be exhibited, in the sense that the rate of convergence will be closely related to the rate at which the
weight sequence, $\alpha(i)$, sums to infinity, as verified by the numerical studies presented next.

F. Numerical Studies: A-ND

We present numerical studies on the A-ND algorithm that verify the analytical results. The first
set of simulations confirms the a.s. consensus in Theorems 3 and 4. Consider an erasure network on
$N = 100$ nodes and $M = 5N$ realizable links, with identical probability of link failure $p = .4$ and
identical channel noise variance $\sigma^2 = 15$. We take $\alpha(i) = 1/4i$ and plot on Fig. 1, on the left, the sample
paths $\{x_n(i)\}_{1 \le n \le N}$ of the sensors over an instantiation of the A-ND algorithm. We note that the
sensor states converge to consensus, thus verifying our analytical results.

The second set of simulations confirms the m.s.e. in Corollary 6. We consider the same erasure network,
but take $\sigma^2 = 30$ and $\alpha(i) = 1/5i$. We simulate 50 runs of the A-ND algorithm from the initial state.
Fig. 1, in the center, plots the propagation of the squared error $(x_n(i) - r)^2$ for a randomly chosen sensor
$n$ for each of the 50 runs. The cloud of (blue) lines denotes the 50 runs, whereas the extended dashed
(red) line denotes the exact m.s.e. computed in Corollary 6. The paths are clustered around the exact
m.s.e., thus verifying our results.

The third set of simulations studies the trade-off between m.s.e. and convergence rate. We consider the
same erasure network, but take $\sigma^2 = 50$, and run the A-ND algorithm from the same initial conditions,
but for the weight sequences $\{\alpha_s(i) = s/i\}_{i \ge 0}$, $s = .33, .1$. Fig. 1, on the right, depicts the propagation of
the squared error averaged over all the sensors, $(1/N) \sum_{n=1}^{N} (x_n(i) - r)^2$, for each case. We see that the
solid (blue) line ($s = .33$) decays much faster initially than the dotted (red) line ($s = .1$) and reaches a
steady state. The dotted line ultimately crosses the solid line and continues to decay at a very slow rate,
thus verifying the m.s.e. versus convergence rate trade-off, which we established rigorously by restricting
attention to the mean state only.
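
For readers who want to reproduce the qualitative behavior, the following is a minimal simulation sketch of the iteration $x(i+1) = x(i) - \alpha(i)[L(i)x(i) + n(i)]$ under an erasure model with Gaussian channel noise. It is not the setup used for Fig. 1: the network (10 nodes with every pair a realizable link), the weight sequence $1/(i+20)$, the noise level, and all function names are assumptions chosen to keep the example small.

```python
import math
import random

def simulate_a_nd(x0, links, p_fail, sigma, alpha, iterations, rng):
    """Iterate the noisy consensus update node by node with random link failures."""
    x = list(x0)
    for i in range(iterations):
        a = alpha(i)
        update = [0.0] * len(x)
        for n, l in links:
            if rng.random() < p_fail:
                continue                           # link (n, l) is down in this iteration
            # each end receives the other's state plus independent channel noise
            update[n] += (x[l] + rng.gauss(0.0, sigma)) - x[n]
            update[l] += (x[n] + rng.gauss(0.0, sigma)) - x[l]
        x = [x[n] + a * update[n] for n in range(len(x))]
    return x

def disagreement(x):
    """Distance of x from the consensus subspace (norm of the deviation from the mean)."""
    avg = sum(x) / len(x)
    return math.sqrt(sum((v - avg) ** 2 for v in x))

N = 10
links = [(n, l) for n in range(N) for l in range(n + 1, N)]   # hypothetical: all pairs realizable
x0 = [float(n) for n in range(N)]
rng = random.Random(1)
x_final = simulate_a_nd(x0, links, 0.4, 1.0, lambda i: 1.0 / (i + 20), 3000, rng)
print(disagreement(x0), disagreement(x_final), sum(x_final) / N)
```

The disagreement shrinks as the weights decay, while the common value the sensors settle on is a random perturbation of the initial average, in line with Theorem 4 and Lemma 5.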

Fig. 1. Left: Sensors sample paths, $\{x_n(i)\}_{1 \le n \le N}$, of the A-ND algorithm, verifying a.s. consensus. Center: Sample paths
of the A-ND algorithm, verifying the m.s.e. bound. Right: m.s.e. versus convergence rate tradeoff.

V. A-NC: Consensus with Repeated Averaging

The stochastic approximation approach to the average consensus problem, discussed in Section IV,
achieves arbitrarily small m.s.e., see eqn. (51), possibly at the cost of a lower convergence rate, especially
when the desired m.s.e. is small. This is mainly because the weights $\alpha(i)$ decrease to zero, slowing
the convergence as time progresses, as discussed in Subsection IV-E. In this Section, we consider an
alternative approach based on repeated averaging, which removes this difficulty. We use a constant link
weight (or step size) and run the consensus iterations for a fixed number, $\hat{\imath}$, of iterations. This procedure
is repeated $\hat{p}$ times, each time starting from the same initial state $x(0)$. Since the $\hat{p}$ final states obtained
at iteration $\hat{\imath}$ of each of the $\hat{p}$ runs are independent, we average them and get the law of large numbers
to work for us. There is an interesting tradeoff between $\hat{\imath}$ and $\hat{p}$ for a constant total number of iterations
$\hat{\imath}\hat{p}$. We describe and analyze this algorithm and consider its tradeoffs next. The Section is organized as
follows. Subsection V-A sets up the problem, states the assumptions, and gives the algorithm A-NC
for distributed average consensus with noisy communication links. We analyze the performance of the
A-NC algorithm in Subsection V-B. In Subsection V-C, we present numerical studies and suggest
generalizations in Subsection V-D.

A. A-NC: Problem Formulation and Assumptions

Again, we consider distributed consensus with communication channel imperfections, as in eqn. (7), to
average the initial state, $x(0) \in \mathbb{R}^{N \times 1}$. The setup is the same as in eqns. (8) to (10).

A-NC Algorithm: The A-NC algorithm is the following Monte Carlo (MC) averaging procedure:

$$x^p(i+1) = x^p(i) - \alpha\left(L x^p(i) + n^p(i)\right), \quad 0 \le i \le \hat\imath - 1, \quad 1 \le p \le \hat p, \quad x^p(0) = x(0) \tag{62}$$

where $x_n^p(i)$ is the state at sensor $n$ at the $i$-th iteration of the $p$-th MC run. In particular, $x_n^p(\hat\imath)$ is the state
at sensor $n$ at the end of the $p$-th MC run. Each run of the A-NC algorithm proceeds for $\hat\imath$ iterations
and there are $\hat p$ MC runs. Finally, the estimate, $\bar x_n^{\hat p}(\hat\imath)$, of the average, $x_{\mathrm{avg}}(0)$, at sensor $n$ is

$$\bar x_n^{\hat p}(\hat\imath) = \frac{1}{\hat p} \sum_{p=1}^{\hat p} x_n^p(\hat\imath) \tag{63}$$
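
The Monte Carlo procedure of eqns. (62)-(63) can be sketched in a few lines. This is an illustrative sketch only: the ring topology, the simplification that the noise vector has i.i.d. Gaussian entries (rather than being assembled from per-link noises), and the names `ring_laplacian` and `a_nc` are assumptions made here for the example.

```python
import random


def ring_laplacian(n):
    """Laplacian of an undirected ring on n >= 3 nodes (illustrative topology)."""
    lap = [[0.0] * n for _ in range(n)]
    for k in range(n):
        lap[k][k] = 2.0
        lap[k][(k + 1) % n] = -1.0
        lap[k][(k - 1) % n] = -1.0
    return lap


def a_nc(lap, x0, alpha, n_iter, n_runs, noise_std, rng):
    """Monte Carlo averaging in the spirit of eqns. (62)-(63).

    Runs n_runs independent passes of n_iter noisy consensus steps, each
    restarted from x0, and returns the per-sensor average of the final states.
    Assumption: the noise vector has i.i.d. N(0, noise_std^2) entries.
    """
    n = len(x0)
    total = [0.0] * n
    for _ in range(n_runs):
        x = list(x0)
        for _ in range(n_iter):
            lx = [sum(lap[a][b] * x[b] for b in range(n)) for a in range(n)]
            x = [x[a] - alpha * (lx[a] + rng.gauss(0.0, noise_std)) for a in range(n)]
        total = [t + v for t, v in zip(total, x)]
    return [t / n_runs for t in total]


rng = random.Random(0)
x0 = [1.0, 2.0, 3.0, 4.0, 10.0]          # initial average r = 4
estimate = a_nc(ring_laplacian(5), x0, 0.3, 30, 400, 1.0, rng)
print([round(v, 2) for v in estimate])   # entries should cluster around 4
```

Shortening each pass (smaller `n_iter`) leaves more residual bias, while fewer passes (smaller `n_runs`) leave more variance, which is the tradeoff analyzed in the rest of this Section.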

We analyze the A-NC algorithm under the following assumptions. These make the analysis tractable
but are not necessary. They can be substantially relaxed, as shown in Subsection V-D.

1) Static Network: The Laplacian $L$ is fixed (deterministic), and the network is connected, $\lambda_2(L) > 0$.
(In Subsection V-D we will allow random link failures.)

2) Independent Gaussian Noise Sequence: The additive noise $\{v_{nl}^p(i)\}_{1 \le n,l \le N,\ i,p \ge 0}$ is an independent
Gaussian sequence with

$$E\left[v_{nl}^p(i)\right] = 0, \quad \forall\, 1 \le n,l \le N,\ i,p \ge 0, \qquad \sup_{n,l,i,p} E\left[\left(v_{nl}^p(i)\right)^2\right] = \mu < \infty \tag{64}$$

From eqn. (10), it then follows that

$$E\left[n^p(i)\right] = 0, \quad \forall\, i,p, \qquad \sup_{i,l,p} E\left[\left(n_l^p(i)\right)^2\right] = \phi_{\max} \le (N-1)\mu < \infty \tag{65}$$

3) Constant link weight: The link weight $\alpha$ is constant across iterations and satisfies

$$0 < \alpha < \frac{2}{\lambda_N(L)} \tag{66}$$

Let $r = \frac{1}{N}\mathbf{1}^T x(0)$ be the initial average. To define a uniform convergence metric, assume that the
initial sensor observations $x(0)$ belong to the following set (for some $K \in [0, \infty)$):

$$\mathcal{K} = \left\{ x(0) \in \mathbb{R}^{N \times 1} \;\middle|\; \|x(0) - r\mathbf{1}\| \le K \right\} \tag{67}$$

As performance metric for the A-NC approach, we adopt the $\epsilon$-$\delta$ averaging time, $T^\alpha(\epsilon, \delta)$, given by

$$T^\alpha(\epsilon, \delta) = \hat\imath^* \hat p^* \tag{68}$$

where

$$(\hat\imath^*, \hat p^*) = \arg\inf_{(\hat\imath, \hat p)} \left\{ \hat\imath \hat p \;\middle|\; \inf_{x(0) \in \mathcal{K}}\ \inf_{1 \le l \le N} P\left( \left| \frac{\bar x_l^{\hat p}(\hat\imath) - r}{K} \right| < \epsilon \right) \ge 1 - \delta \right\} \tag{69}$$

and the superscript $\alpha$ denotes explicitly the dependence on the link weight $\alpha$.

We say that the A-NC algorithm achieves $(\epsilon, \delta)$-consensus if $T^\alpha(\epsilon, \delta)$ is finite. A similar notion of
averaging time has been used by others; see, for example, [35], [36]. The next Subsection upper bounds
the averaging time and analyzes the performance of A-NC.
B. Performance Analysis of A-NC

The A-NC iterations can be rewritten as

$$x^p(i+1) = W x^p(i) + \chi^p(i), \quad 0 \le i \le \hat\imath - 1, \quad 1 \le p \le \hat p, \quad x^p(0) = x(0) \tag{70}$$

where

$$W = I - \alpha L, \qquad \chi^p(i) = -\alpha n^p(i), \quad 0 \le i \le \hat\imath - 1, \quad 1 \le p \le \hat p \tag{71}$$

Also, for the choice of $\alpha$ given in eqn. (66), the following can be shown (see [7]) for the spectral norm

$$\gamma_2 = \rho\left(W - \frac{1}{N}J\right) = \rho\left(I - \alpha L - \frac{1}{N}J\right) < 1, \qquad J = \mathbf{1}\mathbf{1}^T \tag{72}$$

where $\rho(\cdot)$ is the spectral radius, which is equal to the induced matrix 2 norm for symmetric matrices.
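
As a quick numerical illustration of the condition (66), the sketch below evaluates the spectral radius in (72) directly from the Laplacian eigenvalues. It relies on the fact that, for symmetric $L$ with $L\mathbf{1} = 0$, the matrix $I - \alpha L - \frac{1}{N}J$ maps the all-ones direction to zero and scales every other eigenvector of $L$ by $1 - \alpha\lambda$. The ring graph and the function names are illustrative assumptions.

```python
import math


def ring_laplacian_eigs(n):
    """Sorted Laplacian eigenvalues of an n-node ring: 2 - 2cos(2*pi*k/n)."""
    return sorted(2.0 - 2.0 * math.cos(2.0 * math.pi * k / n) for k in range(n))


def gamma2(alpha, eigs):
    """Spectral radius of I - alpha*L - J/N from the sorted eigenvalues of L.

    The all-ones direction is mapped to 0, and every other eigenvector of L
    with eigenvalue lam is scaled by 1 - alpha*lam, so the radius is attained
    at lambda_2 or lambda_N.
    """
    return max(abs(1.0 - alpha * eigs[1]), abs(1.0 - alpha * eigs[-1]))


eigs = ring_laplacian_eigs(5)
upper = 2.0 / eigs[-1]                      # admissible range is (0, upper)
for alpha in (0.1 * upper, 0.5 * upper, 0.9 * upper):
    print(f"alpha={alpha:.3f}  gamma2={gamma2(alpha, eigs):.3f}")
```

Inside the admissible range the printed values stay below one; just outside it the radius exceeds one and the noiseless iteration no longer contracts.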

We next develop an upper bound on the averaging time $T^\alpha(\epsilon, \delta)$ given in (68). In fact, Theorem 7
derives a general bound that holds for generic weight matrices $W$. We come back
in Theorem 10 to bounding the averaging time (68) for the model (70) when the weight matrix $W$ is as
in (71) and the spectral norm is given by (72).

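Theorem 7 (Averaging Time) Consider the distributed iterative procedure (70). The weight matrix $W$ is
generic, i.e., is not necessarily of the form (71). It does satisfy the following assumptions:

1) Symmetric Weights:

$$W = W^T, \qquad W\mathbf{1} = \mathbf{1}, \qquad \gamma_2 = \rho\left(W - \frac{1}{N}J\right) < 1 \tag{73}$$

2) Noise Assumptions: The sequence $\{\chi^p(i)\}_{i \ge 0,\ p \ge 1}$ is a sequence of independent Gaussian noise
vectors with uncorrelated components, such that

$$E\left[\chi^p(i)\right] = 0, \qquad \sup_{i \ge 0,\ p \ge 1}\ \sup_{1 \le l \le N} E\left[\left(\chi_l^p(i)\right)^2\right] \le \phi_{\max} < \infty \tag{74}$$

Then, we have the following upper bound on the averaging time $T^{\gamma_2}(\epsilon, \delta)$ given by eqn. (68):

$$T^{\gamma_2}(\epsilon, \delta) \le \hat T^{\gamma_2}(\epsilon, \delta) \tag{75}$$

where

$$\hat T^{\gamma_2}(\epsilon, \delta) = \left( \frac{\ln(\epsilon/2)}{\ln \gamma_2} + 1 \right) \left[ \left( \frac{4\phi_{\max} \ln(2/\delta)}{K^2\epsilon^2} \right) \left( \frac{\ln(\epsilon/2)}{N \ln \gamma_2} + \frac{1}{N} + \frac{1 - \gamma_2^2\epsilon^2/4}{1 - \gamma_2^2} \left( 1 - \frac{1}{N} \right) \right) + 1 \right] \tag{76}$$

(Note we replace the superscript $\alpha$ by $\gamma_2$ because we prove results here for arbitrary $W$ satisfying
eqn. (73), not necessarily of the form $I - \alpha L$.)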

For the proof we need a result from [10], which we state as a Lemma. 
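Lemma 8 Let the assumptions in Theorem 7 hold true. Define

$$m_l(\hat\imath) = E\left[x_l^p(\hat\imath)\right], \qquad v_l(\hat\imath) = E\left[\left(x_l^p(\hat\imath) - m_l(\hat\imath)\right)^2\right], \quad 1 \le p \le \hat p, \quad 1 \le l \le N \tag{77}$$

(Note that these quantities do not depend on $p$.) Then we have the following:

1) Bound on error mean:

$$|m_l(\hat\imath) - r| \le \gamma_2^{\hat\imath} \|x(0) - r\mathbf{1}\| \le \gamma_2^{\hat\imath} K, \quad \forall\, x(0) \in \mathcal{K}, \quad 1 \le l \le N \tag{78}$$

2) Bound on error variance:

$$v_l(\hat\imath) \le \phi_{\max} \left[ \frac{\hat\imath}{N} + \frac{1 - \gamma_2^{2\hat\imath}}{1 - \gamma_2^2} \left( 1 - \frac{1}{N} \right) \right], \quad \forall\, x(0) \in \mathbb{R}^{N \times 1}, \quad 1 \le l \le N \tag{79}$$

3) $\{x_l^p(\hat\imath)\}_{p=1}^{\hat p}$ is an i.i.d. sequence, with

$$x_l^p(\hat\imath) \sim \mathcal{N}\left(m_l(\hat\imath), v_l(\hat\imath)\right), \quad 1 \le p \le \hat p \tag{80}$$

Proof: For proof, see [10]. ■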
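
The variance bound of Lemma 8 can be checked numerically on a small example. The sketch below assumes the bound has the form $\phi_{\max}\left[\hat\imath/N + \frac{1-\gamma_2^{2\hat\imath}}{1-\gamma_2^2}(1 - 1/N)\right]$, which is what one obtains by splitting $W^{2k}$ into its component along $\mathbf{1}$ and the remainder; the ring topology, the noise covariance $\phi I$, and all function names are illustrative assumptions.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def ring_weight_matrix(n, alpha):
    """W = I - alpha*L for an n-node ring (n >= 3)."""
    w = [[0.0] * n for _ in range(n)]
    for k in range(n):
        w[k][k] = 1.0 - 2.0 * alpha
        w[k][(k + 1) % n] = alpha
        w[k][(k - 1) % n] = alpha
    return w


def exact_variance(w, n_iter, phi):
    """Variance of each component of x(n_iter) when every noise vector has
    covariance phi*I: phi * sum_{k=0}^{n_iter-1} (W^(2k))_{ll}."""
    n = len(w)
    w2 = mat_mul(w, w)
    power = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    diag_sum = [0.0] * n
    for _ in range(n_iter):
        diag_sum = [d + power[l][l] for l, d in enumerate(diag_sum)]
        power = mat_mul(power, w2)
    return [phi * d for d in diag_sum]


def variance_bound(n, gamma, n_iter, phi):
    """phi * [n_iter/N + (1 - gamma^(2 n_iter)) / (1 - gamma^2) * (1 - 1/N)]."""
    geo = (1.0 - gamma ** (2 * n_iter)) / (1.0 - gamma ** 2)
    return phi * (n_iter / n + geo * (1.0 - 1.0 / n))


n, alpha, phi = 5, 0.3, 1.0
lam = sorted(2.0 - 2.0 * math.cos(2.0 * math.pi * k / n) for k in range(n))
gamma = max(abs(1.0 - alpha * lam[1]), abs(1.0 - alpha * lam[-1]))
for n_iter in (1, 5, 20):
    print(n_iter, max(exact_variance(ring_weight_matrix(n, alpha), n_iter, phi)),
          variance_bound(n, gamma, n_iter, phi))
```

The first term grows linearly in the run length because noise accumulates along the consensus direction, while the second term saturates; this is why longer runs reduce bias but increase variance.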
We now return to the proof of Theorem 7. 

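Proof: [Theorem 7] The estimate, $\bar x_l^{\hat p}(\hat\imath)$, of the average at each sensor after $\hat p$ runs is in eqn. (63).
From Lemma 8 and a standard Chernoff type bound for Gaussian random variables (see [37]), then

$$P\left( \left| \frac{\bar x_l^{\hat p}(\hat\imath) - m_l(\hat\imath)}{K} \right| < \epsilon \right) \ge 1 - 2\exp\left( -\frac{\hat p K^2 \epsilon^2}{2 v_l(\hat\imath)} \right) \tag{81}$$

(For the present derivation we assume Gaussian noise; however, the analysis for other noise models may
be done by using the corresponding large deviation rate functions.)
For arbitrary $\epsilon > 0$, define

$$\hat\imath(\epsilon) = \left\lceil \frac{\ln \epsilon}{\ln \gamma_2} \right\rceil \tag{82}$$

Also, for arbitrary $\epsilon, \delta > 0$, define

$$\hat p(\epsilon, \delta) = \left\lceil \frac{2 v_l(\hat\imath)}{K^2\epsilon^2} \ln \frac{2}{\delta} \right\rceil \tag{83}$$

Then, we have from eqn. (78) in Lemma 8,

$$\left| \frac{m_l(\hat\imath) - r}{K} \right| < \epsilon/2, \quad \forall\, \hat\imath \ge \hat\imath(\epsilon/2), \quad 1 \le l \le N \tag{84}$$

Also, we have from eqn. (81),

$$P\left( \left| \frac{\bar x_l^{\hat p}(\hat\imath) - m_l(\hat\imath)}{K} \right| < \epsilon/2 \right) \ge 1 - \delta, \quad \forall\, \hat p \ge \hat p(\epsilon/2, \delta), \quad 1 \le l \le N \tag{85}$$

From the triangle inequality, we get

$$\left| \bar x_l^{\hat p}(\hat\imath) - r \right| \le \left| \bar x_l^{\hat p}(\hat\imath) - m_l(\hat\imath) \right| + \left| m_l(\hat\imath) - r \right|, \quad \forall\, 1 \le l \le N \tag{86}$$

It then follows from eqn. (84) that

$$\left| \bar x_l^{\hat p}(\hat\imath) - m_l(\hat\imath) \right| < K\epsilon/2 \ \Longrightarrow\ \left| \bar x_l^{\hat p}(\hat\imath) - r \right| < K\epsilon, \quad \forall\, \hat\imath \ge \hat\imath(\epsilon/2), \quad 1 \le l \le N \tag{87}$$

We thus have, for $\hat\imath \ge \hat\imath(\epsilon/2)$ and $\hat p \ge \hat p(\epsilon/2, \delta)$,

$$P\left( \left| \frac{\bar x_l^{\hat p}(\hat\imath) - r}{K} \right| < \epsilon \right) \ge P\left( \left| \frac{\bar x_l^{\hat p}(\hat\imath) - m_l(\hat\imath)}{K} \right| < \epsilon/2 \right) \ge 1 - \delta, \quad 1 \le l \le N \tag{88}$$

where the last inequality follows from eqn. (85).

From the definition of $T^{\gamma_2}(\epsilon, \delta)$, see eqn. (68), we then have

$$T^{\gamma_2}(\epsilon, \delta) \le \hat\imath(\epsilon/2)\, \hat p(\epsilon/2, \delta) \tag{89}$$

$$\le \left( \frac{\ln(\epsilon/2)}{\ln \gamma_2} + 1 \right) \hat p(\epsilon/2, \delta)$$

$$\le \left( \frac{\ln(\epsilon/2)}{\ln \gamma_2} + 1 \right) \left[ \left( \frac{4\phi_{\max} \ln(2/\delta)}{K^2\epsilon^2} \right) \left( \frac{\ln(\epsilon/2)}{N \ln \gamma_2} + \frac{1}{N} + \frac{1 - \gamma_2^2\epsilon^2/4}{1 - \gamma_2^2} \left( 1 - \frac{1}{N} \right) \right) + 1 \right]$$

$$= \hat T^{\gamma_2}(\epsilon, \delta) \tag{90}$$

where the third inequality follows from eqn. (79). ■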
We call the upper bound $\hat T^{\gamma_2}(\epsilon, \delta)$ on $T^{\gamma_2}(\epsilon, \delta)$, given in (76) in Theorem 7, the approximate averaging time. We
use $\hat T^{\gamma_2}(\epsilon, \delta)$ to characterize the convergence rate of A-NC. We state a property of $\hat T^{\gamma_2}(\epsilon, \delta)$.

Lemma 9 Recall the spectral radius, $\gamma_2$, defined in eqn. (73). Then, for $0 < \epsilon, \delta < 1$, $\hat T^{\gamma_2}(\epsilon, \delta)$ is an
increasing function of $\gamma_2$ in the interval $0 < \gamma_2 < 1$.

Proof: The lemma follows by differentiating $\hat T^{\gamma_2}(\epsilon, \delta)$ with respect to $\gamma_2$. ■
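
The monotonicity in Lemma 9 is easy to probe numerically. The sketch below assumes the approximate averaging time is the product of a run-length bound, $\ln(\epsilon/2)/\ln\gamma_2 + 1$, and a pass-count bound with the coefficients written in the code; the function name, argument names, and parameter values are hypothetical and serve only as an illustration.

```python
import math


def approx_averaging_time(eps, delta, gamma, big_k, phi_max, n):
    """Assumed form of the approximate averaging time for 0 < eps, delta, gamma < 1."""
    # Bound on the run length: ln(eps/2) / ln(gamma) + 1
    run_len = math.log(eps / 2.0) / math.log(gamma) + 1.0
    # Variance factor: run_len/N + (1 - gamma^2 eps^2 / 4) / (1 - gamma^2) * (1 - 1/N)
    var_term = run_len / n + (1.0 - gamma**2 * eps**2 / 4.0) / (1.0 - gamma**2) * (1.0 - 1.0 / n)
    # Bound on the number of passes
    passes = 4.0 * phi_max * math.log(2.0 / delta) / (big_k * eps) ** 2 * var_term + 1.0
    return run_len * passes


for g in (0.2, 0.5, 0.8, 0.95):
    print(g, round(approx_averaging_time(0.1, 0.05, g, 50.0, 100.0, 230), 2))
```

Both factors grow as $\gamma_2$ approaches one, so a better-connected network (smaller $\gamma_2$) needs fewer total iterations for the same $(\epsilon, \delta)$ pair.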
We now study the convergence properties of the A-NC algorithm, eqn. (62), i.e., when the weight
matrix $W$ is of the form (71) and the spectral norm satisfies (72). Then, the averaging time becomes a
function of the weight $\alpha$. We make this explicit by denoting as $T^\alpha(\epsilon, \delta)$ and $\hat T^\alpha(\epsilon, \delta)$ the averaging time
and the approximate averaging time, respectively.



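Theorem 10 (A-NC Averaging Time) Consider the A-NC algorithm for distributed averaging, under
the assumptions given in Subsection V-A. Define

$$T^*(\epsilon, \delta) = \inf_{0 < \alpha < \frac{2}{\lambda_N(L)}} T^\alpha(\epsilon, \delta), \qquad \hat T^*(\epsilon, \delta) = \inf_{0 < \alpha < \frac{2}{\lambda_N(L)}} \hat T^\alpha(\epsilon, \delta) \tag{91}$$

Then:

1)

$$T^\alpha(\epsilon, \delta) < \infty, \quad \forall\, 0 < \epsilon, \delta < 1, \quad 0 < \alpha < \frac{2}{\lambda_N(L)} \tag{92}$$

This essentially means that the A-NC algorithm is realizable for $\alpha$ in the interval $\left(0, \frac{2}{\lambda_N(L)}\right)$.

2) For $0 < \alpha < \frac{2}{\lambda_N(L)}$ we have

$$T^\alpha(\epsilon, \delta) \le \hat T^\alpha(\epsilon, \delta) \tag{93}$$

where

$$\hat T^\alpha(\epsilon, \delta) = \left( \frac{\ln(\epsilon/2)}{\ln \gamma_2} + 1 \right) \left[ \left( \frac{4\alpha^2\phi_{\max} \ln(2/\delta)}{K^2\epsilon^2} \right) \left( \frac{\ln(\epsilon/2)}{N \ln \gamma_2} + \frac{1}{N} + \frac{1 - \gamma_2^2\epsilon^2/4}{1 - \gamma_2^2} \left( 1 - \frac{1}{N} \right) \right) + 1 \right] \tag{94}$$

and $\gamma_2 = \rho\left(I - \alpha L - \frac{1}{N}J\right)$.

3) For a given choice of the pair $0 < \epsilon, \delta < 1$, the best achievable averaging time $T^*(\epsilon, \delta)$ is bounded
above by $\hat T^*(\epsilon, \delta)$, given by

$$\hat T^*(\epsilon, \delta) = \inf_{0 < \alpha \le \frac{2}{\lambda_2(L) + \lambda_N(L)}} \left( \frac{\ln(\epsilon/2)}{\ln(1 - \alpha\lambda_2(L))} + 1 \right) \left[ \left( \frac{4\alpha^2\phi_{\max} \ln(2/\delta)}{K^2\epsilon^2} \right) \left( \frac{\ln(\epsilon/2)}{N \ln(1 - \alpha\lambda_2(L))} + \frac{1}{N} + \frac{1 - (1 - \alpha\lambda_2(L))^2\epsilon^2/4}{1 - (1 - \alpha\lambda_2(L))^2} \left( 1 - \frac{1}{N} \right) \right) + 1 \right] \tag{95}$$

Note that in (95) the optimization is over a smaller range on $\alpha$ than in (91).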

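Proof: The iterations for the A-NC algorithm are given by (70), where the weight matrix $W$ is
given by (71) and the spectral norm $\gamma_2$ by (72). Also, since $\chi^p(i) = -\alpha n^p(i)$, we get

$$\sup_{i \ge 0,\ p \ge 1}\ \sup_{1 \le l \le N} E\left[\left(\chi_l^p(i)\right)^2\right] \le \alpha^2\phi_{\max} < \infty \tag{96}$$

Then, the assumptions (73, 74) in Theorem 7 are satisfied for $\alpha$ in the range (66) and the two items (92)
and (94) follow.

To prove item 3), we note that it follows from 2) that $T^*(\epsilon, \delta) \le \hat T^*(\epsilon, \delta)$, where

$$\hat T^*(\epsilon, \delta) = \inf_{0 < \alpha < \frac{2}{\lambda_N(L)}} \hat T^\alpha(\epsilon, \delta) = \inf_{0 < \alpha < \frac{2}{\lambda_N(L)}} \left( \frac{\ln(\epsilon/2)}{\ln \gamma_2} + 1 \right) \left[ \left( \frac{4\alpha^2\phi_{\max} \ln(2/\delta)}{K^2\epsilon^2} \right) \left( \frac{\ln(\epsilon/2)}{N \ln \gamma_2} + \frac{1}{N} + \frac{1 - \gamma_2^2\epsilon^2/4}{1 - \gamma_2^2} \left( 1 - \frac{1}{N} \right) \right) + 1 \right]$$

and $\gamma_2 = \rho\left(I - \alpha L - \frac{1}{N}J\right)$. Now, consider the functions

$$g(\gamma_2) = \left( \frac{4\phi_{\max} \ln(2/\delta)}{K^2\epsilon^2} \right) \left( \frac{\ln(\epsilon/2)}{\ln \gamma_2} + 1 \right) \left( \frac{\ln(\epsilon/2)}{N \ln \gamma_2} + \frac{1}{N} + \frac{1 - \gamma_2^2\epsilon^2/4}{1 - \gamma_2^2} \left( 1 - \frac{1}{N} \right) \right) \tag{97}$$

and

$$h(\gamma_2) = \frac{\ln(\epsilon/2)}{\ln \gamma_2} + 1 \tag{98}$$

with $\gamma_2$ as before. Similar to Lemma 9, we can show that $g(\gamma_2)$ and $h(\gamma_2)$ are non-decreasing functions
of $\gamma_2$. It can be shown that $\gamma_2 = \rho\left(I - \alpha L - \frac{1}{N}J\right)$ attains its minimum value at $\alpha = \alpha^\star$ (see [7]), where

$$\alpha^\star = \frac{2}{\lambda_2(L) + \lambda_N(L)} \tag{99}$$

We thus have

$$g\left(\rho\left(I - \alpha^\star L - \frac{1}{N}J\right)\right) \le g\left(\rho\left(I - \alpha L - \frac{1}{N}J\right)\right), \quad \alpha^\star \le \alpha < \frac{2}{\lambda_N(L)} \tag{100}$$

and

$$h\left(\rho\left(I - \alpha^\star L - \frac{1}{N}J\right)\right) \le h\left(\rho\left(I - \alpha L - \frac{1}{N}J\right)\right), \quad \alpha^\star \le \alpha < \frac{2}{\lambda_N(L)} \tag{101}$$

which implies that, for $\alpha^\star \le \alpha < \frac{2}{\lambda_N(L)}$,

$$\hat T^\alpha(\epsilon, \delta) = \alpha^2 g\left(\rho\left(I - \alpha L - \frac{1}{N}J\right)\right) + h\left(\rho\left(I - \alpha L - \frac{1}{N}J\right)\right) \tag{102}$$

$$\ge (\alpha^\star)^2 g\left(\rho\left(I - \alpha^\star L - \frac{1}{N}J\right)\right) + h\left(\rho\left(I - \alpha^\star L - \frac{1}{N}J\right)\right) = \hat T^{\alpha^\star}(\epsilon, \delta)$$

So, there is no need to consider $\alpha > \alpha^\star$. This leads to

$$\hat T^*(\epsilon, \delta) = \inf_{0 < \alpha < \frac{2}{\lambda_N(L)}} \hat T^\alpha(\epsilon, \delta) = \inf_{0 < \alpha \le \frac{2}{\lambda_2(L) + \lambda_N(L)}} \hat T^\alpha(\epsilon, \delta) \tag{103}$$

Also, it can be shown that (see [7])

$$\gamma_2 = \rho\left(I - \alpha L - \frac{1}{N}J\right) = 1 - \alpha\lambda_2(L), \quad 0 < \alpha \le \frac{2}{\lambda_2(L) + \lambda_N(L)} \tag{104}$$

This, together with eqn. (103), proves item 3).

■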
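
The two spectral facts used at the end of the proof, namely that $\gamma_2$ is smallest at $\alpha = 2/(\lambda_2(L) + \lambda_N(L))$ and equals $1 - \alpha\lambda_2(L)$ up to that point, can be confirmed by a simple grid search. The sketch below is illustrative only: it uses an 8-node ring, whose Laplacian eigenvalues are known in closed form, and hypothetical variable names.

```python
import math


def gamma2(alpha, lam2, lam_n):
    """Spectral radius of I - alpha*L - J/N in terms of lambda_2 and lambda_N."""
    return max(abs(1.0 - alpha * lam2), abs(1.0 - alpha * lam_n))


n = 8
eigs = sorted(2.0 - 2.0 * math.cos(2.0 * math.pi * k / n) for k in range(n))
lam2, lam_n = eigs[1], eigs[-1]
alpha_star = 2.0 / (lam2 + lam_n)

# Grid over the admissible range (0, 2 / lambda_N)
grid = [k * (2.0 / lam_n) / 2000.0 for k in range(1, 2000)]
best = min(grid, key=lambda a: gamma2(a, lam2, lam_n))
print(f"closed form: {alpha_star:.4f}, grid minimizer: {best:.4f}")
```

Note that this weight minimizes $\gamma_2$ only; the minimizer of the averaging-time bound also depends on the factor $\alpha^2$ multiplying the noise term and is therefore generally smaller.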

We comment on Theorem 10. For a given connected network, it gives explicitly the weights $\alpha$ for
which the A-NC algorithm is realizable (i.e., the averaging time is finite for any $(\epsilon, \delta)$ pair). For these
choices of $\alpha$, it provides an upper bound on the averaging time. The Theorem also addresses the problem
of choosing the $\alpha$ that minimizes this upper bound.³ There is, in general, no closed form solution for this
problem and, as demonstrated by eqn. (95), even if a minimizer exists, its value depends on the actual
choice of the pair $(\epsilon, \delta)$. In other words, in general, there is no uniform minimizer of the averaging time.

C. A-NC: Numerical Studies

We present numerical studies on the A-NC algorithm. We consider a sensor network of $N = 230$
nodes, with communication topology given by an LPS-II Ramanujan graph (see [10]) of degree 6.⁴ For the
first set of simulations, we take $\phi_{\max} = 100$, $K = 50$, and fix $\delta$ at .05 (this guarantees that the estimate
belongs to the $\epsilon$-ball with probability at least .95). We vary $\epsilon$ in steps, keeping the other parameters
fixed, and compute the optimal $(\epsilon, \delta)$ averaging time, $\hat T^*(\epsilon, \delta)$, given by eqn. (95) and the corresponding
optimum $\alpha^*$. Fig. 2 (left) plots $\hat T^*(\epsilon, \delta)$ as a function of $\epsilon$, while Fig. 2 (right) plots $\alpha^*$ vs.
$\epsilon$. As expected, $\hat T^*(\epsilon, \delta)$ decreases with increasing $\epsilon$. The behavior of $\alpha^*$ is interesting. It shows that,
to improve accuracy (small $\epsilon$), the link weight $\alpha$ should be chosen appropriately small, while, for lower
accuracy, $\alpha$ can be increased, which speeds up the algorithm. Also, as $\epsilon$ becomes larger, $\alpha^*$ increases to
make the averaging time smaller, ultimately saturating at $\alpha^\star$, given by eqn. (99). This behavior is similar
to the A-ND algorithm, where slower decreasing (smaller) weight sequences correspond to smaller
asymptotic m.s.e. (increased accuracy) at the cost of a lower convergence rate.

³ Note that the minimizer, $\alpha^*$, of the upper bound $\hat T^\alpha(\epsilon, \delta)$ does not necessarily minimize the actual averaging time $T^\alpha(\epsilon, \delta)$.
However, as will be demonstrated by simulation studies, the upper bound is tight and hence $\alpha^*$ obtained by minimizing $\hat T^\alpha(\epsilon, \delta)$
is a good enough design criterion.

⁴ This is a 6-regular graph, i.e., all the nodes have degree 6.

Fig. 2. Left: Plot of $\hat T^*(\epsilon, \delta)$ with varying $\epsilon$, keeping $\delta$ fixed at .05. Right: Plot of $\alpha^*$ with varying $\epsilon$, keeping $\delta$ fixed at .05.

In the second set of simulations, we study the tradeoff between the number of iterations per Monte
Carlo pass, $\hat\imath$, and the total number of passes, $\hat p$. Define the quantities, as suggested by eqn. (89):

$$\hat\imath^* = \frac{\ln(\epsilon/2)}{\ln(1 - \alpha^*\lambda_2(L))} + 1 \tag{105}$$

$$\hat p^* = \left( \frac{4\alpha^{*2}\phi_{\max} \ln(2/\delta)}{K^2\epsilon^2} \right) \left( \frac{\hat\imath^*}{N} + \frac{1 - (1 - \alpha^*\lambda_2(L))^2\epsilon^2/4}{1 - (1 - \alpha^*\lambda_2(L))^2} \left( 1 - \frac{1}{N} \right) \right) + 1 \tag{106}$$

where $\alpha^*$ is the minimizer in eqn. (95). In the following, we vary $\epsilon$ and the channel noise variance $\phi_{\max}$,
taking $K = 50$, $\delta = .05$, and using the same communication network. In particular, in Fig. 3 (left) we plot
$(\hat\imath^*, \hat p^*)$ vs. $\epsilon$ for $\phi_{\max} = 10$, while in Figs. 3 (center) and 3 (right), we repeat the same for $\phi_{\max} = 30$ and
$\phi_{\max} = 100$, respectively. The figures demonstrate an interesting trade-off between $\hat\imath^*$ and $\hat p^*$, and show
that, for smaller values of the channel noise variance, the number of Monte Carlo passes, $\hat p^*$, is much
smaller than the number of iterations per pass, $\hat\imath^*$, as expected. As the channel noise variance grows,
i.e., as the channel becomes more unreliable, $\hat p^*$ increases to combat the noise accumulated
at each pass.

Fig. 3. Plot of $(\hat\imath^*, \hat p^*)$ with varying $\epsilon$. Left: $\phi_{\max} = 10$. Center: $\phi_{\max} = 30$. Right: $\phi_{\max} = 100$.

Finally, we present a numerical study to verify the tightness of the bound $\hat T^*(\epsilon, \delta)$. For this
we consider a network of $N = 100$ nodes with $M = 5N$ edges (generated according to an Erdős-Rényi
graph, see [37]). We consider $\phi_{\max} = 80$, $K = 50$, and $\delta = .05$. To obtain $T^*(\epsilon, \delta)$ for varying $\epsilon$, we
fix $\alpha$, sample $x(0) \in \mathcal{K}$, and for each such $x(0)$ we generate 100 runs of the A-NC algorithm. We
check the condition in eqn. (69) and compute $T^\alpha(\epsilon, \delta)$. We repeat this experiment for varying $\alpha$ to obtain
$T^*(\epsilon, \delta) = \inf_\alpha T^\alpha(\epsilon, \delta)$.⁵ We also obtain $\hat T^*(\epsilon, \delta)$ from eqn. (95). Fig. 4 (left) plots $\hat T^*(\epsilon, \delta)$ (solid
red line) and $T^*(\epsilon, \delta)$ (dotted blue line) w.r.t. $\epsilon$, while Fig. 4 (right) plots the ratio $\hat T^*(\epsilon, \delta)/T^*(\epsilon, \delta)$ w.r.t. $\epsilon$.
The plots show that the bound $\hat T^*(\epsilon, \delta)$ is reasonably tight, especially at small and large values of $\epsilon$, with
fluctuations in between. Note that because the numerical values obtained for $T^*(\epsilon, \delta)$ are a lower bound,
the bound $\hat T^*(\epsilon, \delta)$ is actually tighter than appears in the plots.

Fig. 4. Left: Plot of $\hat T^*(\epsilon, \delta)$, $T^*(\epsilon, \delta)$ with varying $\epsilon$. Right: Plot of $\hat T^*(\epsilon, \delta)/T^*(\epsilon, \delta)$ with varying $\epsilon$.



The bound $\hat T^*(\epsilon, \delta)$ is significant since: i) it is easy to compute from eqn. (95); ii) it is a reasonable
approximation to the exact $T^*(\epsilon, \delta)$; iii) it avoids the costly simulations involved in computing $T^*(\epsilon, \delta)$ by
Monte Carlo; and iv) it gives the right tradeoff between $\hat\imath^*$ and $\hat p^*$ (see eqns. (105-106)), thus determining
the stopping criterion of the A-NC algorithm.

⁵ The definition of $T^*(\epsilon, \delta)$ (see eqn. (69)) requires the infimum over all $x(0) \in \mathcal{K}$, which is uncountable, so we verify eqn. (69)
by sampling points from $\mathcal{K}$. The $T^*(\epsilon, \delta)$ thus obtained is, in fact, a lower bound for the actual $T^*(\epsilon, \delta)$.

D. A-NC: Generalizations

In this Subsection, we suggest generalizations to the A-NC algorithm. For convenience of analysis,
we assumed before a static network and Gaussian noise. These assumptions can be considerably weakened.
For instance, the static network assumption may be replaced by a random link failure model with $\lambda_2(\bar L) > 0$,
where $\bar L = E[L]$. Also, the independent noise sequence in eqn. (64) may be non-Gaussian. In this
case, the A-NC algorithm iterations take the form

$$x^p(i+1) = \left( x^p(i) - \alpha L^p(i) x^p(i) \right) - \alpha n^p(i), \quad 0 \le i \le \hat\imath - 1, \quad 1 \le p \le \hat p, \quad x^p(0) = x(0) \tag{107}$$

where $L^p(i)$ is an i.i.d. sequence of Laplacian matrices with $\lambda_2(\bar L) > 0$. We then have

$$x^p(\hat\imath) = \left( \prod_{j=0}^{\hat\imath - 1} \left( I - \alpha L^p(j) \right) \right) x(0) + H^p(\hat\imath), \quad 1 \le p \le \hat p \tag{108}$$

It is clear that $\{H^p(\hat\imath)\}_{1 \le p \le \hat p}$ is an i.i.d. sequence of zero mean random vectors. Under the assumption
$\lambda_2(\bar L) > 0$, there exists $\alpha$ (see [11]) such that

$$P\left[ \lim_{\hat\imath \to \infty} \left( \prod_{j=0}^{\hat\imath - 1} \left( I - \alpha L^p(j) \right) \right) x(0) = r\mathbf{1} \right] = 1, \quad 1 \le p \le \hat p \tag{109}$$

We thus choose $\hat\imath$ so that $\left( \prod_{j=0}^{\hat\imath - 1} \left( I - \alpha L^p(j) \right) \right) x(0)$ is sufficiently close to $r\mathbf{1}$. The final estimate is

$$\bar x^{\hat p}(\hat\imath) = \frac{1}{\hat p} \sum_{p=1}^{\hat p} \left( \prod_{j=0}^{\hat\imath - 1} \left( I - \alpha L^p(j) \right) \right) x(0) + \frac{1}{\hat p} \sum_{p=1}^{\hat p} H^p(\hat\imath) \tag{110}$$

The first sum is close to $r\mathbf{1}$ by choice of $\hat\imath$. We now choose $\hat p$ large enough so that the second term is
close to zero. In this way, we can apply the A-NC algorithm to more general scenarios.

The above argument guarantees that $(\epsilon, \delta)$-consensus is achievable under these generic conditions, in
the sense that the corresponding averaging time, $T^\alpha(\epsilon, \delta)$, will be finite. A thorough analysis requires
a reasonable computable upper bound like eqn. (94), followed by optimization over $\alpha$ to give the best
achievable convergence rate. (Note that a computable upper bound is required because, as pointed out earlier, it
is very difficult to find a stopping criterion using Monte Carlo simulations, and the resulting $T^*(\epsilon, \delta)$ will
be a lower bound since the set $\mathcal{K}$ is uncountable.) One way to proceed is to identify the appropriate large
deviation rate (as suggested in eqn. (81)). However, the results will depend on the specific nature of the
link failures and noise, which we do not pursue in this paper due to lack of space.



VI. Conclusion 

We consider distributed average consensus when the topology is random (links may fail at random
times) and the communication in the channels is corrupted by additive noise. Noisy consensus leads to
a bias-variance dilemma. We considered two versions of consensus that lead to two compromises to this
problem: i) A-ND fits the framework of stochastic approximation. It a.s. converges to the consensus
subspace and to a consensus random variable $\theta$, an unbiased estimate of the desired average, whose
variance we compute and bound; and ii) A-NC uses repeated averaging by Monte Carlo, achieving
$(\epsilon, \delta)$-consensus. In A-ND the bias can be made arbitrarily small, but the rate at which it decreases
can be traded for variance, i.e., there is a trade-off between m.s.e. and convergence rate. A-NC uses a constant
weight $\alpha$ and hence outperforms A-ND in terms of convergence rate. Computation-wise, A-ND is
superior since A-NC requires more inter-sensor coordination to execute the independent passes. The
estimate obtained by A-NC does not possess the nice statistical properties, including unbiasedness, as
the computation is terminated after a finite time in each pass.

Finally, these algorithms may be applied to other problems in sensor networks with random links and
noise, e.g., distributed load balancing in parallel processing or distributed network flow.

Appendix 

Proof of Theorem 1 and A-ND generalizations under Assumptions 1.2) and 2.2)

Proof: [Theorem 1] The proof follows that of Theorem 2.7.1 in [32]. It suffices to prove it for $x(0) =
x_0$ a.s., where $x_0 \in \mathbb{R}^{N \times 1}$ is a deterministic starting state. Let $\{\mathcal{F}_i^x = \sigma\{x(j) : 0 \le j \le i\}\}_{i \ge 0}$ be the filtration
w.r.t. which $\{x(i)\}_{i \ge 0}$, $x \in \mathbb{R}^{N \times 1}$ (and hence functionals of $\{x(i)\}_{i \ge 0}$) are adapted.
Define the function $W(i, x)$, $i \ge 0$, $x \in \mathbb{R}^{N \times 1}$, as

$$W(i, x) = \left( 1 + V(i, x) \right) \prod_{j=i}^{\infty} \left[ 1 + g(j) \right] \tag{111}$$

It can be shown that

$$\mathcal{L}W(i, x) \le -\alpha(i)\varphi(i, x), \quad i \ge 0, \quad x \in \mathbb{R}^{N \times 1} \tag{112}$$

and, hence, under the assumptions (Theorem 2.5.1 in [32])

$$P\left[ \liminf_{i \to \infty} \rho(x(i), B) = 0 \right] = 1 \tag{113}$$

which, together with assumption (25), implies

$$P\left[ \liminf_{i \to \infty} V(i, x(i)) = 0 \right] = 1 \tag{114}$$

Also, it can be shown that the process $W(i, x(i))$ is a non-negative supermartingale (Theorem 2.2.2
in [32]) and, hence, converges a.s. to a finite value. It then follows from eqn. (111) that $V(i, x(i))$ also
converges a.s. to a finite value. Together with eqn. (114), the a.s. convergence of $V(i, x(i))$ implies

$$P\left[ \lim_{i \to \infty} V(i, x(i)) = 0 \right] = 1 \tag{115}$$

The theorem then follows from assumptions (23) and (24) (see also Theorem 2.7.1 in [32]). ■

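Theorem 11 (A-ND: Convergence) Consider the A-ND algorithm given in Section IV-A with arbitrary
initial state $x(0) \in \mathbb{R}^{N \times 1}$, under the Assumptions 1.2), 2.2), 3). Then,

$$P\left[ \lim_{i \to \infty} \rho(x(i), \mathcal{C}) = 0 \right] = 1 \tag{116}$$

Proof: In the A-ND eqn. (9), the Laplacian $L(i, x(i))$ and the noise $n(i, x(i))$ are both state
dependent. We follow Theorem 3 till eqn. (38) and modify eqn. (39) according to the new assumptions.
The sequences $\{L(i, x)\}$ and $\{n(i, x)\}$ are independent. By the Gershgorin circle theorem, the eigenvalues
of $(L(i, x) - \bar L)$ are less than $2N$ in magnitude. From the noise variance growth condition we have

$$E\left[ \alpha^2(i)\, n^T(i, x)\, \bar L\, n(i, x) \right] = \alpha^2(i)\, E\left[ n_{\mathcal{C}^\perp}^T(i, x)\, \bar L\, n_{\mathcal{C}^\perp}(i, x) \right] \le \alpha^2(i)\lambda_N(\bar L)\, E\left[ \| n_{\mathcal{C}^\perp}(i, x) \|^2 \right] \le \alpha^2(i)\lambda_N(\bar L) \left[ c_1 + c_2 \| x_{\mathcal{C}^\perp} \|^2 \right] \tag{117}$$

Using eqn. (117) and a sequence of steps similar to eqn. (39) we have

$$\mathcal{L}V(i, x) = -2\alpha(i) x^T \bar L^2 x + \alpha^2(i) x^T \bar L^3 x + \alpha^2(i) E\left[ \left( (L(i, x) - \bar L) x \right)^T \bar L \left( (L(i, x) - \bar L) x \right) \right] + E\left[ \alpha^2(i)\, n(i, x)^T \bar L\, n(i, x) \right]$$

$$\le -2\alpha(i) x^T \bar L^2 x + \alpha^2(i)\lambda_N^3(\bar L) \| x_{\mathcal{C}^\perp} \|^2 + 4\alpha^2(i) N^2 \lambda_N(\bar L) \| x_{\mathcal{C}^\perp} \|^2 + \alpha^2(i)\lambda_N(\bar L) \left[ c_1 + c_2 \| x_{\mathcal{C}^\perp} \|^2 \right]$$

Now, using the fact that $x^T \bar L x \ge \lambda_2(\bar L) \| x_{\mathcal{C}^\perp} \|^2$, we have

$$\mathcal{L}V(i, x) \le -2\alpha(i) x^T \bar L^2 x + \alpha^2(i) \left[ c_1\lambda_N(\bar L) + \frac{\lambda_N^3(\bar L)}{\lambda_2(\bar L)} x^T \bar L x + \frac{4N^2\lambda_N(\bar L)}{\lambda_2(\bar L)} x^T \bar L x + \frac{c_2\lambda_N(\bar L)}{\lambda_2(\bar L)} x^T \bar L x \right]$$

$$\le -\alpha(i)\varphi(i, x) + g(i) \left[ 1 + V(i, x) \right]$$

where $\varphi(i, x) = 2 x^T \bar L^2 x$ and $g(i) = \alpha^2(i) \max\left( c_1\lambda_N(\bar L),\ \frac{\lambda_N^3(\bar L)}{\lambda_2(\bar L)} + \frac{4N^2\lambda_N(\bar L)}{\lambda_2(\bar L)} + \frac{c_2\lambda_N(\bar L)}{\lambda_2(\bar L)} \right)$. It can be verified
that $\varphi(i, x)$ and $g(i)$ satisfy the conditions for Theorem 1. Hence (116). ■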



Theorem 12 Consider the A-ND algorithm under the Assumptions 1.2), 2.1), and 3). Then, there exists
a.s. a finite real random variable $\theta$ such that

$$P\left[ \lim_{i \to \infty} x(i) = \theta\mathbf{1} \right] = 1 \tag{118}$$

Proof: Note that $\mathbf{1}^T L(i, x(i)) = 0$, $\forall i$. The proof then follows from Theorem 4, since the noise
assumptions are the same. ■



Lemma 13 Let $\theta$ be as given in Theorem 12 and $r$, the initial average, as given in eqn. (4), under the
Assumptions 1.2), 2.1), and 3). Let the m.s.e. $\zeta = E\left[(\theta - r)^2\right]$. Then we have:

1) Unbiasedness:

$$E[\theta] = r \tag{119}$$


2) M.S.E. Bound: 



i>0 



(120) 



Proof: Follows from Theorem 12 and Lemma 5. 



It is possible to have results similar to Theorem 12 and Lemma 13 under Assumption 2.2) on the noise. In 
that case, we need exact mixing conditions on the sequence. Also, Assumption 2.2) places no restriction 
on the growth rate of the variance of the noise component in the consensus subspace. By Theorem 11, 
we still get a.s. consensus, but the m.s.e. may become unbounded, if no growth restrictions are imposed. 